Abstract —Active Sequential Hypothesis Testing (ASHT) is an 
extension of the classical sequential hypothesis testing problem 
with controls. Chernoff [1] proposed a policy called Procedure 
A and showed its asymptotic optimality as the cost of sampling 
was driven to zero. In this paper we study a further extension 
where we introduce costs for switching of actions. We show that a 
modification of Chernoff’s Procedure A, one that we call Sluggish 
Procedure A, is asymptotically optimal even with switching costs. 
The growth rate of the total cost, as the probability of false 
detection is driven to zero, and as a switching parameter of the 
Sluggish Procedure A is driven down to zero, is the same as that 
without switching costs. 

I. Introduction 

Active Sequential Hypothesis Testing (ASHT) is a generalization 
of the classical sequential hypothesis testing problem 
where, at each observation instant, the decision maker has a 
choice that controls the type or quality of the observation. 
For example, in a cognitive radio setting, at each observation 
instant, the decision maker must select exactly one frequency 
band, of the several available, for observation. Another example 
is visual search where, at each time instant, one can 
focus only on a small subset of the entire visual field, and 
one must choose this subset for information gathering. ASHT 
can be used as a modeling tool for many other applications 
apart from visual search and cognitive radio, such as anomaly 
detection, medical diagnostics, etc. 

Chernoff [1] studied ASHT in the context of designing 
optimal experiments. His performance metric was the total 
cost of sampling, which is proportional to delay, plus a penalty 
for false detection. Chernoff proposed a policy, the so-called 
Procedure A, and showed its asymptotic optimality as the cost 
of sampling went to zero. Procedure A maintains a posterior 
distribution on the set of hypotheses and, at each instant, 
selects actions according to the hypothesis with the highest 
posterior probability. 

In this paper we study a further extension of ASHT where, 
in addition to the average decision delay, we also penalize 
switching of actions. The current extension is motivated by 
visual search where a switch in action implies a change 
in the location of one’s focus and a fast movement of the 
eyes (a saccade), which has an associated biological cost that 
translates to a delay cost. 

We propose a modified Procedure A where the next action 
depends on the current posterior and the previous action, 
whereas in Procedure A the next action depends only on the 
current posterior. The modification is simple: at a given decision 
instant, if an independently generated Bernoulli random 
variable turns up “1”, then the next action is taken as per 
Procedure A; otherwise the current action is continued. We call this 
the Sluggish Procedure A. 

There has been a flurry of recent activity extending Chernoff’s 
work in other directions. In a series of works, Naghshvar 
and Javidi [2]-[6] studied ASHT from a Bayesian 
cost minimization perspective. The total cost was the sum 
of decision delay and a penalty for false detection. They 
proposed policies, similar to Chernoff’s Procedure A, identified 
bounds on the total cost, and established their proposed 
policies’ asymptotic optimality in the same asymptotic regime 
as Chernoff’s.¹ Nitinawarat et al. [7] studied active hypothesis 
testing in fixed sample size and in sequential settings. They 
also minimize decision delay subject to a constraint on the 
conditional probability of false detection. When these conditional 
probabilities of false detection are driven to zero, the 
resulting asymptotic regime is the same as Chernoff’s. In this 
asymptotic regime, they obtained results similar to 
Chernoff’s but under milder assumptions. They also prove a 
stronger asymptotic result based on the “risk associated with 
a decision”. Nitinawarat and Veeravalli [8] extended ASHT 
to Markovian observations and non-uniform costs on actions. 
Recently Cohen and Zhao [9] studied ASHT from an anomaly 
detection perspective. They showed that, in their setting, a 
simple deterministic policy was optimal. This is in contrast 
to the randomized policies advocated in the other works. None of the 
above works considers costs associated with a change in action. 

Our contribution: We show that the aforementioned 
modification of Chernoff’s Procedure A, the Sluggish Procedure A, 
is asymptotically optimal even with switching costs. 
Further, we show that the growth rate of the total cost, as 
the probability of false detection is driven to zero, and as a 
switching parameter of the Sluggish Procedure A is driven to 
zero, is the same as that without switching costs. 

II. The ASHT Abstraction 

In this section, we describe our mathematical model for 
ASHT and collect all the relevant theoretical results. 

¹They also consider the asymptotics where the number of hypotheses is 
large. This is not of direct relevance to our study. 


A. The ASHT Model 

1) The model description: Let us begin by setting up some 
notation. 

Let $H_i$, $i = 1, 2, \ldots, M$, denote the $M$ hypotheses of which 
exactly one, denoted $H$, holds true. We do not assume a prior 
on the hypotheses. Let $\mathcal{A}$ be the set of all possible actions, 
which we take as finite: $|\mathcal{A}| = K < \infty$. Let $\mathcal{X}$ be the 
observation space. Let $(X_n)_{n \geq 1}$ and $(A_n)_{n \geq 1}$ denote the observation 
process and the control process respectively. We write $X^n$ for 
$(X_1, \ldots, X_n)$ and similarly $A^n$ for $(A_1, \ldots, A_n)$. We also 
write $\mathcal{P}(\mathcal{A})$ for the set of probability distributions on $\mathcal{A}$. 

A policy $\pi$ is a sequence of action plans that at time $n$ 
looks at the history $X^{n-1}, A^{n-1}$ and prescribes a composite 
action that is either $(\text{stop}, \delta)$ or $(\text{continue}, \lambda)$, as explained 
next. If the composite action is $(\text{stop}, \delta)$, then the controller 
stops taking further samples (or retires) and indicates $\delta$ as 
its decision on the hypothesis; $\delta \in \{1, 2, \ldots, M\}$. If the 
composite action is $(\text{continue}, \lambda)$, the controller picks the next 
action $A_n$ according to the distribution $\lambda \in \mathcal{P}(\mathcal{A})$. Let $\tau(\pi)$ 
be the stopping time 

\[ \tau(\pi) := \inf\{n \geq 1 \mid A_n = (\text{stop}, \cdot)\}. \]

Consider a policy $\pi$. Conditioned on action $A_n$ and the true 
hypothesis $H$, we assume that $X_n$ is conditionally independent 
of previous actions $A^{n-1} = (A_1, A_2, \ldots, A_{n-1})$, previous 
observations $X^{n-1} = (X_1, X_2, \ldots, X_{n-1})$, and the policy. 
Let $q_i^a$ be the conditional probability density function, with 
respect to some reference measure $\mu$, of the observation $X_n$ 
under action $a$ when $H = H_i$. Let $D(q_i^a \| q_j^a)$ denote the 
relative entropy² between the conditional probability measures 
associated with the observations under hypothesis $H_i$ and 
under hypothesis $H_j$, upon action $a$. Denote by $\text{unif}(\mathcal{A})$ the 
uniform distribution on $\mathcal{A}$. Let $q_i^\pi(x^n, a^n)$ be the probability 
density function of observations and actions $(x^n, a^n)$ 
till time $n$, with respect to the common reference measure 
$\mu^{\otimes n} \times \text{unif}(\mathcal{A})^{\otimes n}$. Let $Z_i^\pi(n)$ denote the log-likelihood process 
of hypothesis $H_i$, i.e., 

\[ Z_i^\pi(n) = \log q_i^\pi(X^n, A^n). \tag{1} \]

²By an abuse of notation, we use the densities of the probability measures 
as the arguments of the relative entropy function. 

Going forward, for ease of notation, we drop the superscript 
$\pi$ while describing $q_i^\pi$, $Z_i^\pi$, and other variables, but their 
dependence on the underlying policy should be kept in mind, 
and the policy under consideration will be clear from the 
context. Define $Z(n) = (Z_1(n), Z_2(n), \ldots, Z_M(n))$. Let 
$Z_{ij}(n)$ denote the log-likelihood ratio (LLR) process of $H_i$ 
with respect to $H_j$, i.e., 

\[ Z_{ij}(n) = Z_i(n) - Z_j(n) = \log \frac{q_i(X^n, A^n)}{q_j(X^n, A^n)} = \sum_{l=1}^{n} \log \frac{q_i^{A_l}(X_l)}{q_j^{A_l}(X_l)}. \]

Let $E_i$ denote the conditional expectation and let $P_i$ denote 
the conditional probability measure under $H = H_i$. (More 
formally, these should be represented as $E_i^\pi$ and $P_i^\pi$. But as done 
above, we omit the superscript $\pi$.) 

Given an error tolerance vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_M)$ with 
$0 < \alpha_i < 1$, let $\Pi(\alpha)$ be the set of policies 

\[ \Pi(\alpha) = \{\pi : P_i(\delta \neq i) \leq \alpha_i, \ \forall i\}. \]

These are policies that meet a specified tolerance for the 
conditional probability of false detection. We define $\|\alpha\| := \max_i \alpha_i$. 

We define $\lambda_i$ to be the best mixed action that guards $H_i$ 
against its nearest alternative, i.e., $\lambda_i \in \mathcal{P}(\mathcal{A})$ such that 

\[ \lambda_i := \arg\max_{\lambda \in \mathcal{P}(\mathcal{A})} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda(a) D(q_i^a \| q_j^a). \tag{2} \]

If there are several maximizers, pick one arbitrarily. Further, 
define 

\[ D_i := \max_{\lambda \in \mathcal{P}(\mathcal{A})} \min_{j \neq i} \sum_{a \in \mathcal{A}} \lambda(a) D(q_i^a \| q_j^a). \tag{3} \]

Let $\mathcal{A}_{ij} = \{a \in \mathcal{A} : D(q_i^a \| q_j^a) > 0\}$, the set of all actions that 
can differentiate hypothesis $H_i$ from hypothesis $H_j$. From well 
known properties of relative entropy, we obtain $\mathcal{A}_{ij} = \mathcal{A}_{ji}$. 
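
As a rough numerical illustration of (2) and (3), the sketch below computes the maximizing mixed action and the resulting max-min value by grid search for a toy problem. Everything specific here is an assumption made for illustration and does not come from the paper: observations are Bernoulli, there are only two actions, `params[i][a]` is the success probability under hypothesis i and action a, and the helper names `kl_bernoulli` and `max_min_action` are hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """Relative entropy D(Bern(p) || Bern(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def max_min_action(i, params, grid=1000):
    """Grid-search approximation of lambda_i in (2) and D_i in (3).

    Assumes exactly two actions, so a mixed action is (lam0, 1 - lam0).
    """
    M = len(params)
    best_lam, best_val = None, -math.inf
    for k in range(grid + 1):
        lam = (k / grid, 1 - k / grid)
        # Inner minimum over the alternatives j != i.
        val = min(
            sum(lam[a] * kl_bernoulli(params[i][a], params[j][a]) for a in range(2))
            for j in range(M)
            if j != i
        )
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam, best_val


# Toy example: action 0 separates H_0 from H_1, action 1 separates H_0 from H_2.
params = [[0.5, 0.5], [0.8, 0.5], [0.5, 0.8]]
lam0, D0 = max_min_action(0, params)
```

In this toy example the two alternatives are symmetric, so the search returns the uniform mixture and the max-min value is half of the single-action divergence. For more than two actions, (2) is a linear program, and a grid search would not scale.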

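2) Assumptions: Throughout, we make the following assumptions. 

(I) $E_i\left[\left(\log \frac{q_i^a(X)}{q_j^a(X)}\right)^2\right] < \infty \quad \forall i, j, a.$ 

(IIa) $\mathcal{A}_{ij} \neq \emptyset \quad \forall i, j, \ i \neq j.$ 

(IIb) $\beta := \min_{i, j, k :\, i \neq j} \sum_{a \in \mathcal{A}_{ij}} \lambda_k(a) > 0.$ 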

Assumption (I) implies that $D(q_i^a \| q_j^a) < \infty$, which in turn 
ensures that no single observation can result in a reliable 
decision. Assumption (I) is used in proving the lower bound on 
the expected number of samples needed to satisfy the tolerance 
criterion. This is also assumed by Chernoff [1] and Nitinawarat 
et al. [7]. 

Assumption (IIa) ensures that for any distinct $i$ and $j$, there 
is at least one control that can help distinguish the hypotheses 
$H_i$ from $H_j$. If $\mathcal{A}_{ij} = \emptyset$ for some $i$ and $j$, it will be impossible 
to distinguish them from each other. Assumption (IIb) is 
stronger than, and implies, Assumption (IIa). 
Assumption (IIb) ensures that if actions are taken according 
to any of the $\lambda_k$ in (2) then, for any two hypotheses $H_i$ and 
$H_j$, there is a positive probability of choosing an action that 
can discriminate them. We shall use Assumption (IIb) in the 
achievability proofs of our policies. It allows for easier proofs 
for our policies, and makes the presentation simpler. However, 
one can work with Assumption (IIa) as well, and construct 
asymptotically optimal policies, with minor modifications to 
our policies. We will describe the modifications later in this 
section. 
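
To make the difference between Assumptions (IIa) and (IIb) concrete, the following sketch checks both for a given table of relative entropies. The data layout (`D[i][j][a]` for the divergence of hypothesis i from hypothesis j under action a, `lam[k][a]` for the k-th mixed action), the function names, and the rounded numbers are hypothetical and chosen only for illustration.

```python
def distinguishing_actions(D, i, j):
    """The set A_ij of actions a with D(q_i^a || q_j^a) > 0."""
    return {a for a, d in enumerate(D[i][j]) if d > 0}


def check_assumptions(D, lam):
    """Return (does (IIa) hold?, beta), with beta as in Assumption (IIb)."""
    M = len(D)
    pairs = [(i, j) for i in range(M) for j in range(M) if i != j]
    iia = all(len(distinguishing_actions(D, i, j)) > 0 for i, j in pairs)
    beta = min(
        sum(lam[k][a] for a in distinguishing_actions(D, i, j))
        for i, j in pairs
        for k in range(M)
    )
    return iia, beta


# Three hypotheses, two actions. Action 0 separates H_0 from H_1 and
# action 1 separates H_0 from H_2; only positivity of d1, d2 matters here.
d1, d2 = 0.22, 0.19
D = [
    [[0, 0], [d1, 0], [0, d1]],
    [[d2, 0], [0, 0], [d2, d1]],
    [[0, d2], [d1, d2], [0, 0]],
]
# Maximizers of (2) for this table: H_1 is best guarded by action 0 alone,
# H_2 by action 1 alone, and H_0 by an even mixture.
lam = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
iia_holds, beta = check_assumptions(D, lam)
```

Here every pair of hypotheses can be told apart by some action, so (IIa) holds, yet the mixed action guarding H_1 puts no mass on the only action that separates H_0 from H_2, so beta is zero and (IIb) fails.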

3) Switching cost and total cost: The costs are as follows. 

Switching Cost: Let $g(a, a')$ denote the cost of switching 
from action $a$ to action $a'$. We assume 

\[ g(a, a') \geq 0 \ \ \forall a, a' \in \mathcal{A} \quad \text{and} \quad g(a, a) = 0. \]

Define $g_{\max} = \max_{a, a'} g(a, a')$. We also assume $g_{\max} < \infty$. 

Total cost: For a policy $\pi \in \Pi(\alpha)$, the total cost $C(\pi)$ is 
taken to be the sum of the stopping time (delay) and the net 
switching cost, i.e., 

\[ C(\pi) := \tau(\pi) + \sum_{l=1}^{\tau(\pi) - 1} g(A_l, A_{l+1}). \]
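
The total cost is straightforward to evaluate on a recorded run. The sketch below assumes, as an illustrative convention that the text does not spell out, that `actions` holds the actions taken up to the stopping time, so that the delay is `len(actions)`; the function name and the cost function `g` are hypothetical.

```python
def total_cost(actions, g):
    """Delay plus accumulated switching cost for one recorded run.

    actions: list of the actions taken up to the stopping time (assumed layout).
    g: switching cost function with g(a, a) == 0.
    """
    delay = len(actions)
    switching = sum(g(a, b) for a, b in zip(actions, actions[1:]))
    return delay + switching


# Example: a constant cost of 2 per switch; this run has two switches.
cost = total_cost([0, 0, 1, 1, 0], lambda a, b: 0 if a == b else 2)
```

With five samples and two switches of cost 2 each, the total cost in this example is 9.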

4) Asymptotics: We shall be interested in the asymptotics 
of the minimum expected total cost $E_i[C(\pi)]$, minimized over 
policies in $\Pi(\alpha)$, as $\|\alpha\| \to 0$. Note that there are $M$ such 
conditional expected total costs, one for each hypothesis. 


B. Results on the ASHT Model 

We collect all the main results in this section. We first 
identify a lower bound. 

1) The converse - Lower bound: The following proposition 
gives a lower bound for the expected conditional stopping 
time, given hypothesis $H = H_i$, for all policies belonging 
to $\Pi(\alpha)$. 

Proposition 1. Assume (I). For each $i$, we have 

\[ \liminf_{\|\alpha\| \to 0} \ \inf_{\pi \in \Pi(\alpha)} \frac{E_i[\tau(\pi)]}{|\log \|\alpha\||} \geq \frac{1}{D_i}, \tag{4} \]

where $D_i$ is given in (3). 

Proof: Since only expected time to stop is considered, the 
proof of [1, Th. 2, p. 766] applies. ■ 

We then have the following corollary. 

Corollary 2. Assume (I). For each $i$, we have 

\[ \liminf_{\|\alpha\| \to 0} \ \inf_{\pi \in \Pi(\alpha)} \frac{E_i[C(\pi)]}{|\log \|\alpha\||} \geq \frac{1}{D_i}. \tag{5} \]

Proof: With switching costs added, we have $C(\pi) \geq \tau(\pi)$, 
and the corollary follows from Proposition 1. ■ 


2) Achievability - A modification to Chernoff’s Procedure 
A: Chernoff [1] proposed a policy termed Procedure A and 
showed that it has asymptotically optimal expected decision 
delay. We now describe Procedure A. 

Policy Procedure A: $\pi_{PA}(L)$ 

Fix $L > 0$. 

At time $n$: 

• Let $\theta(n) = \arg\max_i Z_i(n)$. Ties are resolved 
uniformly at random. 

• If $Z_{\theta(n), j}(n) < \log((M - 1)L)$ for some $j \neq \theta(n)$, 
then $A_{n+1}$ is chosen according to $\lambda_{\theta(n)}$, i.e., 

\[ \Pr(A_{n+1} = a) = \lambda_{\theta(n)}(a). \tag{6} \]

• If $Z_{\theta(n), j}(n) \geq \log((M - 1)L)$ for all $j \neq \theta(n)$, 
then the test retires and declares $H_{\theta(n)}$ as 
the true hypothesis. 

We now describe a modified policy that can be made as close 
as one wishes to being asymptotically optimal in the presence 
of switching costs. We introduce a switching parameter $\eta$, 
$0 < \eta \leq 1$, which determines the maximum transition rate out of a 
given action. When $\eta = 1$, we will have the original Procedure 
A. When $\eta$ approaches zero, the rate of jumping out of the 
current action approaches zero. 

Policy Sluggish Procedure A: $\pi_{SA}(L, \eta)$ 

Fix $L > 0$, $0 < \eta \leq 1$. 

At time $n$: 

• Let $\theta(n) = \arg\max_i Z_i(n)$. Ties are resolved 
uniformly at random. 

• If $Z_{\theta(n), j}(n) < \log((M - 1)L)$ for some $j \neq \theta(n)$, 
then $A_{n+1}$ is chosen as follows. 

- Generate $U_{n+1}$, a Bernoulli($\eta$) random 
variable, independent of all other random 
variables. 

- If $U_{n+1} = 0$, then $A_{n+1} = A_n$. 

- If $U_{n+1} = 1$, then generate $A_{n+1}$ according 
to distribution $\lambda_{\theta(n)}$. 

• If $Z_{\theta(n), j}(n) \geq \log((M - 1)L)$ for all $j \neq \theta(n)$, 
then the test retires and declares $H_{\theta(n)}$ 
as the true hypothesis. 

We also consider two variants of $\pi_{SA}(L, \eta)$ which are useful 
in the analysis. 

• Policy $\pi^i_{SA}(L, \eta)$: This is the same as $\pi_{SA}(L, \eta)$, 
but stops only at decision $i$ when $\min_{j \neq i} Z_{ij}(n) \geq \log(L(M - 1))$. 

• Policy $\pi_{SA}(\eta)$: This is the same as $\pi_{SA}(L, \eta)$, but never 
stops, and hence $L$ is irrelevant. 

Under a fixed hypothesis $H = H_i$, and the triplet of policies 
$(\pi_{SA}(L, \eta), \pi^i_{SA}(L, \eta), \pi_{SA}(\eta))$, it is easily seen that there 
is a common underlying probability measure with respect to 
which the processes $(X_n, A_n)_{n \geq 1}$ associated with the three 
policies are naturally coupled, with only the stopping times 
being different. Under this coupling, the following are true: 

\[ \tau(\pi^i_{SA}(L, \eta)) \geq \tau(\pi_{SA}(L, \eta)), \]

\[ \{\tau(\pi_{SA}(L, \eta)) > n\} \subseteq \{\tau(\pi^i_{SA}(L, \eta)) > n\} \subseteq \Big\{\min_{j \neq i} Z_{ij}(n) < \log(L(M - 1))\Big\}. \]
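
The description of Sluggish Procedure A can be simulated directly. The following is a minimal sketch, not the authors' implementation, under assumptions made only for illustration: observations are Bernoulli with success probability `p[i][a]` under hypothesis i and action a, the mixed actions `lam[i]` are supplied by the caller, the first action is drawn uniformly at random (the policy description does not specify it), and `max_steps` is a safety cap.

```python
import math
import random


def sluggish_procedure_a(p, lam, true_h, L, eta, rng, max_steps=100_000):
    """Simulate one run; returns (decision, stopping time, list of actions)."""
    M, K = len(p), len(p[0])
    threshold = math.log((M - 1) * L)
    # Z[i] accumulates log q_i^{A_l}(X_l). Policy-dependent terms are common
    # to all hypotheses, so they cancel in every Z_ij and in the arg max.
    Z = [0.0] * M
    action = rng.randrange(K)  # assumption: the first action is uniform
    actions = []
    for n in range(1, max_steps + 1):
        actions.append(action)
        x = 1 if rng.random() < p[true_h][action] else 0
        for i in range(M):
            Z[i] += math.log(p[i][action] if x == 1 else 1 - p[i][action])
        top = max(Z)
        theta = rng.choice([i for i in range(M) if Z[i] == top])
        if all(Z[theta] - Z[j] >= threshold for j in range(M) if j != theta):
            return theta, n, actions  # retire and declare H_theta
        if rng.random() < eta:  # U_{n+1} = 1: resample as in Procedure A
            action = rng.choices(range(K), weights=lam[theta])[0]
        # otherwise U_{n+1} = 0 and the current action is kept
    raise RuntimeError("no decision within max_steps")


# Toy problem: action 0 is uninformative, action 1 separates H_0 from H_1.
p = [[0.5, 0.2], [0.5, 0.8]]
lam = [[0.0, 1.0], [0.0, 1.0]]
decision, tau, actions = sluggish_procedure_a(
    p, lam, true_h=0, L=100.0, eta=0.2, rng=random.Random(0)
)
switches = sum(a != b for a, b in zip(actions, actions[1:]))
```

In this toy problem both mixed actions put all their mass on action 1, so a run switches at most once (from action 0 to action 1), and lowering `eta` only delays that switch.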


Policy $\pi_{SA}(L, \eta)$ is designed to stop only when the posteriors 
suggest a reliable decision. This is formalized now. 

Proposition 3. Assume (I) and (IIb). For Policy $\pi_{SA}(L, \eta)$, 
the conditional probability of error under hypothesis $H_i$ is 
upper bounded by 

\[ P_i(\delta \neq i) \leq \frac{1}{L}. \tag{7} \]

See Appendix IV-A for a proof. As a consequence we have 
$\pi_{SA}(L, \eta) \in \Pi(\alpha)$ if $\alpha_i \geq \frac{1}{L}$ $\forall i$. We now state the 
time-delay performance of the policy $\pi_{SA}(L, \eta)$. 







Theorem 4. Assume (I) and (IIb). Consider the policy 
$\pi_{SA}(L, \eta)$. The expected time to make a decision, for each 
$i$, satisfies 

\[ \lim_{L \to \infty} \frac{E_i[\tau(\pi_{SA}(L, \eta))]}{\log L} \leq \frac{1}{D_i}. \tag{8} \]

See Appendix IV-B for a detailed proof. This result is 
crucial because it shows that the policy $\pi_{SA}(L, \eta)$, despite the sluggishness 
induced by $\eta$, remains asymptotically optimal when only the 
stopping time $\tau(\pi_{SA}(L, \eta))$ is considered as cost. We now 
leverage this to show that, if $\eta$ is sufficiently small, $\pi_{SA}(L, \eta)$ 
is near optimal when switching costs are also taken into 
account. 


Proposition 5. Assume (I) and (IIb). Consider the policy 
$\pi_{SA}(L, \eta)$. We then have, for each $i$, 

\[ \lim_{L \to \infty} \frac{E_i[C(\pi_{SA}(L, \eta))]}{\log L} \leq \frac{1 + g_{\max}\eta}{D_i}. \tag{9} \]

Proof: We can write the following chain of inequalities, 
with $\tau$ standing for $\tau(\pi_{SA}(L, \eta))$: 

\begin{align*}
E_i[C(\pi_{SA}(L, \eta))] &= E_i\left[\tau + \sum_{l=1}^{\tau - 1} g(A_l, A_{l+1})\right] \\
&\leq E_i[\tau] + g_{\max} E_i\left[\sum_{l=1}^{\tau - 1} 1\{A_l \neq A_{l+1}\}\right] \\
&\leq E_i[\tau] + g_{\max} E_i\left[\sum_{l=1}^{\tau - 1} U_{l+1}\right] \\
&= E_i[\tau] + g_{\max} \eta E_i[\tau - 1] \\
&\leq E_i[\tau] (1 + g_{\max}\eta). \tag{10}
\end{align*}

In the above chain, the penultimate equality holds because of 
Wald’s equation [10]. Dividing by $\log L$, letting $L \to \infty$, and 
using Theorem 4, we see that (9) holds. ■ 

3) Asymptotic optimality: Proposition 1 and Proposition 5 
show that, when the conditional probability of false detection 
is driven to zero, the proposed policy $\pi_{SA}(L, \eta)$ has nearly the 
same growth rate for cost as an asymptotically optimal policy 
without switching costs. We now make the above statement 
precise. The parameter $\eta$ should be suitably chosen to get 
sufficiently close to asymptotic optimality. 


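Theorem 6. Assume (I) and (IIb). Consider a sequence of 
vectors $(\alpha^{(n)})_{n \geq 1}$, where $\alpha^{(n)}$ is the $n$th tolerance vector, 
such that $\lim_{n \to \infty} \|\alpha^{(n)}\| = 0$ and 

\[ \lim_{n \to \infty} \frac{\|\alpha^{(n)}\|}{\min_k \alpha_k^{(n)}} < B \tag{11} \]

for some $B$. Then, the sequence of policies $\pi_{SA}(L_n, \eta)$ with 
$\log L_n = -\log \min_k \alpha_k^{(n)}$ belongs to $\Pi(\alpha^{(n)})$. Furthermore, 
for each $i$, 

\[ \liminf_{n \to \infty} \ \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E_i[C(\pi)]}{\log L_n} = \lim_{\eta \downarrow 0} \lim_{n \to \infty} \frac{E_i[C(\pi_{SA}(L_n, \eta))]}{\log L_n} = \frac{1}{D_i}. \tag{12} \]

Proof: The fact that $\pi_{SA}(L_n, \eta) \in \Pi(\alpha^{(n)})$ is evident 
from Proposition 3 and $\frac{1}{L_n} \leq \alpha_k^{(n)}$, $k = 1, 2, \ldots, M$. We then 
have the following chain of inequalities: 

\begin{align*}
\frac{1}{D_i} &\leq \liminf_{n \to \infty} \ \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E_i[C(\pi)]}{|\log \|\alpha^{(n)}\||} \\
&= \liminf_{n \to \infty} \ \inf_{\pi \in \Pi(\alpha^{(n)})} \frac{E_i[C(\pi)]}{\log L_n} \\
&\leq \lim_{\eta \downarrow 0} \lim_{n \to \infty} \frac{E_i[C(\pi_{SA}(L_n, \eta))]}{\log L_n} \\
&\leq \frac{1}{D_i}.
\end{align*}

The first inequality follows from Proposition 1 (via Corollary 2). The next 
equality follows from the fact that $\lim_{n \to \infty} \frac{\log L_n}{|\log \|\alpha^{(n)}\||} = 1$, 
which in turn is true due to the assumption (11). The 
inequality after it follows because $\pi_{SA}(L_n, \eta)$ is one specific policy 
in $\Pi(\alpha^{(n)})$. The last inequality follows from Proposition 5 after 
letting $\eta \downarrow 0$. Consequently, all inequalities must be equalities. ■ 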


C. Discussion on Assumption (IIb) 

Chernoff proved the asymptotic optimality of Procedure 
A under a stronger assumption than Assumption 
(IIb); namely, Chernoff required 

\[ D(q_i^a \| q_j^a) > 0 \quad \forall a \text{ and for all pairs } i \neq j. \tag{13} \]

Assumption (IIb) ensures that, at all times, and for any pair 
of hypotheses $i$ and $j$, $i \neq j$, there is a positive probability 
of choosing an action that can distinguish the two hypotheses. 
This suffices for Chernoff’s proofs to go through. Specifically, 
we shall use Assumption (IIb) to prove the exponential decay 
result in Proposition 11. Nitinawarat et al. [7] proposed 
a modified Procedure A that sampled actions randomly at 
intervals $\lceil \nu^l \rceil$, $l \geq 1$, $\nu > 1$, and showed that their proposed 
policy is asymptotically optimal under the weaker Assumption 
(IIa). The random sampling enabled them to obtain a 
polynomial decay counterpart of Proposition 11 of the Appendix. 
Recently, Cohen and Zhao [9] claimed the asymptotic optimality 
of Procedure A under the weaker Assumption (IIa) 
for an active anomaly detection problem, which is a specific 
ASHT problem. We conjecture that Chernoff’s Procedure 
A is asymptotically optimal under the weaker Assumption 
(IIa) for all ASHT problems. A proof of this claim has 
remained elusive. Nevertheless, policies whose performances 
are provably arbitrarily close to the optimum can be designed. 
We make the above claim precise in the next proposition. 


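Proposition 7. Assume (I) and (IIa). Fix $\epsilon > 0$. Then there 
exists a sequence of policies $\{\pi_\epsilon(L)\}$ that satisfies 
$\pi_\epsilon(L) \in \Pi((1/L, \ldots, 1/L))$ and 

\[ \lim_{L \to \infty} \frac{E_i[\tau(\pi_\epsilon(L))]}{\log L} \leq \frac{1}{(1 - \epsilon) D_i}. \tag{14} \]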


We omit the proof because the needed modifications to the 
proof of Theorem 4 are straightforward. Policy $\pi_\epsilon(L)$ can 
be constructed as a variant of Procedure A that, at each instant 
$n$, chooses an action according to $\text{unif}(\mathcal{A})$ with probability $\epsilon$ or 
as per (6) with probability $(1 - \epsilon)$. Thus, at the cost of a small 
penalty, we can design nearly asymptotically optimal policies 
under the weaker Assumption (IIa). A similar argument holds 
true with switching costs, just as Theorem 4 is extended in 
Theorem 6, albeit with a corresponding but arbitrarily small 
increase in the total cost. Again, we omit the proof of this 
claim. Hence Assumption (IIa) suffices for the asymptotic 
growth rate to be $\frac{1}{D_i}$. 
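
The mixing used to construct this variant of Procedure A amounts to replacing the sampling distribution in (6) by a mixture with the uniform distribution. A minimal sketch follows; the function name `epsilon_mixed` and the list representation of the mixed action are assumptions for illustration.

```python
def epsilon_mixed(lam, eps):
    """Mixture (1 - eps) * lam + eps * unif(A) over K = len(lam) actions."""
    K = len(lam)
    return [(1 - eps) * weight + eps / K for weight in lam]


# A mixed action that ignores actions 1 and 2, before and after mixing.
mixed = epsilon_mixed([1.0, 0.0, 0.0], eps=0.3)
```

Every action then has probability at least eps / K, so any nonempty set of distinguishing actions is sampled with positive probability, which is what the argument under (IIa) needs.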

III. Conclusion 

We studied active sequential hypothesis testing (ASHT) 
with switching costs. We proposed a modification to Chernoff’s 
Procedure A that can be made to approach the asymptotic 
performance of Procedure A. The proposed algorithm 
merely slows down the switching of actions via an i.i.d. 
Bernoulli modulation process. The growth rate of total cost, 
as the probability of false detection is driven to zero, and as 
the switching parameter $\eta$ is driven to zero, is the same as that 
without switching costs. 

Appendix 

IV. Properties of log-likelihood ratio processes 
under $\pi_{SA}(L, \eta)$ 

We will now show some desirable properties of the log-likelihood 
ratio processes under the policy $\pi_{SA}(L, \eta)$. These 
properties are analogous to those of classical sequential hypothesis 
testing, but their analyses are more involved because, 
with actions, 1) the log-likelihood ratio increments are 
dependent, and 2) the increments are no longer identically 
distributed. The properties we will establish will be useful in 
forthcoming proofs. 

Define $\Delta Z_{ji}(n) = Z_{ji}(n) - Z_{ji}(n - 1)$. We then have 
$\Delta Z_{ji}(n) = -\Delta Z_{ij}(n)$. Here, $\Delta Z_{ji}(n)$ is the increment in 
the process associated with the log-likelihood ratio of $H_j$ with 
respect to $H_i$ at time $n$. We now show that under Assumptions 
(I) and (IIb), and under policy $\pi_{SA}(L, \eta)$, the log-likelihood 
ratio processes are well behaved in the following sense: the 
log-likelihood ratio of the true hypothesis $H_i$ with respect to 
any other hypothesis $H_j$ has a positive drift. This will be made 
precise in Proposition 11. Towards that, we first establish the 
following lemmas. 

Lemma 8. Assume (I) and (IIb). Fix $i, j$ such that $j \neq i$. Let 
$a \in \mathcal{A}_{ij}$. We then have, for all $0 < s < 1$, 

\[ \rho_{ij}^a(s) := E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, A_n = a\right] < 1 \quad \forall n. \tag{15} \]

Proof: The following sequence of inequalities holds: 

\begin{align*}
E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, A_n = a\right] &= \int_{x \in \mathcal{X}} \left(\frac{q_j^a(x)}{q_i^a(x)}\right)^s q_i^a(x) \, dx \\
&= \int_{x \in \mathcal{X}} (q_j^a(x))^s (q_i^a(x))^{1 - s} \, dx \\
&< \left(\int_{x \in \mathcal{X}} q_j^a(x) \, dx\right)^s \left(\int_{x \in \mathcal{X}} q_i^a(x) \, dx\right)^{1 - s} \tag{16} \\
&= 1.
\end{align*}

The strict inequality in (16) follows from Hölder’s inequality 
and the fact that $a \in \mathcal{A}_{ij}$ implies $q_i^a$ and $q_j^a$ are not linearly 
related. ■ 
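
For a finite observation alphabet the conditional expectation in (15) is a finite sum of terms of the form (q_j(x))^s (q_i(x))^(1-s), and the strict inequality is easy to check numerically. The sketch below is illustrative only; the function name `rho` and the example distributions are assumptions.

```python
def rho(q_i, q_j, s):
    """E_i[exp(s * log(q_j(X) / q_i(X)))] for pmfs with full support."""
    return sum((b ** s) * (a ** (1 - s)) for a, b in zip(q_i, q_j))


distinct = rho([0.5, 0.5], [0.8, 0.2], s=0.5)   # the action separates the hypotheses
identical = rho([0.5, 0.5], [0.5, 0.5], s=0.5)  # the action does not separate them
```

When the two distributions coincide the sum equals one, which is why the lemma restricts attention to actions that distinguish the two hypotheses.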

The above result was obtained by conditioning on the action 
$A_n$ to lie in the desirable set $\mathcal{A}_{ij}$. The result is independent of 
the underlying policy, because when conditioned on the current 
action $A_n$, the observation is independent of the policy. 

Recall that $\pi_{SA}(\eta)$ is the non-stopping variant of $\pi_{SA}(L, \eta)$. 
Further, recall from Assumption (IIb) that we have 
$\beta = \min\left\{\sum_{a \in \mathcal{A}_{ij}} \lambda_k(a) \mid 1 \leq i, j, k \leq M, \ i \neq j\right\} > 0$. Now we 
show that, under Assumption (IIb) and policy $\pi_{SA}(\eta)$, a 
similar result holds, but without conditioning on the action 
$A_n$. First, let us define 

\[ \rho_{ij}(s) := \eta\beta \left(\max_{a \in \mathcal{A}_{ij}} \rho_{ij}^a(s)\right) + (1 - \eta\beta). \tag{17} \]

The fact that $\rho_{ij}(s) < 1$ is evident from Lemma 8. 

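Lemma 9. Assume (I) and (IIb). Consider the policy $\pi_{SA}(\eta)$. 
Fix $i$. We then have, for all $0 < s < 1$, 

\[ E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, X^{n-1}, A^{n-1}\right] \leq \rho_{ij}(s) < 1 \quad \forall n, \ \forall j \neq i. \]

Proof: The following sequence of inequalities holds, as 
described after the last inequality. 

\begin{align*}
E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, X^{n-1}, A^{n-1}\right]
&= E_i\left[E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, X^{n-1}, A^{n-1}, A_n\right] \,\middle|\, X^{n-1}, A^{n-1}\right] \\
&= \sum_{a \in \mathcal{A}} P_i(A_n = a \mid X^{n-1}, A^{n-1}) \, E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, A_n = a\right] \tag{18} \\
&\leq P_i(A_n \in \mathcal{A}_{ij} \mid X^{n-1}, A^{n-1}) \max_{a \in \mathcal{A}_{ij}} E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, A_n = a\right] \\
&\quad + \left(1 - P_i(A_n \in \mathcal{A}_{ij} \mid X^{n-1}, A^{n-1})\right) \tag{19} \\
&\leq \eta\beta \left(\max_{a \in \mathcal{A}_{ij}} \rho_{ij}^a(s)\right) + (1 - \eta\beta) \\
&< 1. \tag{20}
\end{align*}

Equality (18) holds because conditioned on $A_n = a$, $\Delta Z_{ji}(n)$ 
is independent of the remaining history. Inequality (19) holds 
because, when $a \notin \mathcal{A}_{ij}$, we have $\Delta Z_{ji}(n) = 0$. The 
penultimate inequality is a consequence of the fact that, under 
$\pi_{SA}(\eta)$, one will choose an action $a \in \mathcal{A}_{ij}$ with probability 
at least $\eta\beta$. ■ 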



















We now proceed to show an inequality analogous to the 
Chernoff bound for the log-likelihood ratio. In classical 
sequential hypothesis testing, due to independence of samples 
across time, the expectation of the likelihood ratio can be 
split as the product of the expectation of the likelihood ratio 
increments, as follows: 

\[ E_i\left[e^{s Z_{ji}(n)}\right] = \prod_{k=1}^{n} E_i\left[e^{s \Delta Z_{ji}(k)}\right]. \]

The same decomposition is not valid in ASHT because actions 
introduce dependency in the likelihood ratio increments across 
time. However, we can obtain an upper bound of the product 
form. 


Lemma 10. Assume (I) and (IIb). Consider policy $\pi_{SA}(\eta)$. 
Fix $i$. We then have, for all $0 < s < 1$, 

\[ E_i\left[e^{s Z_{ji}(n)}\right] \leq (\rho_{ij}(s))^n \quad \forall n, \ \forall j \neq i. \]

Proof: Once again, we proceed through a chain of 
inequalities, all of which are now self-evident: 

\begin{align*}
E_i\left[e^{s Z_{ji}(n)}\right]
&= E_i\left[E_i\left[e^{s Z_{ji}(n-1)} e^{s \Delta Z_{ji}(n)} \,\middle|\, X^{n-1}, A^{n-1}\right]\right] \\
&= E_i\left[e^{s Z_{ji}(n-1)} E_i\left[e^{s \Delta Z_{ji}(n)} \,\middle|\, X^{n-1}, A^{n-1}\right]\right] \\
&\leq \rho_{ij}(s) \, E_i\left[e^{s Z_{ji}(n-1)}\right] \quad \text{(from Lemma 9)} \\
&\leq (\rho_{ij}(s))^n,
\end{align*}

where the last inequality follows by induction. ■ 

We now show an exponential decay property of the log-likelihood 
process which primarily stems from the anticipated 
negative drift in $Z_{ji}(n)$ for $j \neq i$. Let us alert the reader that 
in the following proposition we deal with $Z_{ij}(n) = -Z_{ji}(n)$. 

Proposition 11. Assume (I) and (IIb). Consider policy 
$\pi_{SA}(\eta)$. Fix $i$. There exist constants $C_K > 0$ and $\gamma > 0$ 
such that 

\[ P_i\left(\min_{j \neq i} Z_{ij}(n) < K\right) < C_K e^{-\gamma n}. \tag{21} \]

$C_K$ is independent of $i$, but $\gamma$ may depend on $i$. 

Proof: This follows from the previous lemmas via the 
following: 

\begin{align*}
P_i\left(\min_{j \neq i} Z_{ij}(n) < K\right) &= P_i\left(\max_{j \neq i} Z_{ji}(n) > -K\right) \\
&\leq \sum_{j \neq i} P_i\left(Z_{ji}(n) > -K\right) \tag{22} \\
&\leq \sum_{j \neq i} e^{sK} E_i\left[e^{s Z_{ji}(n)}\right] \tag{23} \\
&\leq e^{sK} \sum_{j \neq i} (\rho_{ij}(s))^n \tag{24} \\
&\leq e^{sK} (M - 1) \max_{j \neq i} (\rho_{ij}(s))^n \\
&< C_K e^{-\gamma n},
\end{align*}

where $\max_{j \neq i} \rho_{ij}(s) = e^{-\gamma}$, and $C_K = M e^{sK}$. The inequality 
in (22) is due to the union bound, the inequality in (23) is 
due to Chernoff’s bound with $0 < s < 1$, and the inequality 
in (24) is due to Lemma 10. ■ 

We now show that under the hypothesis $H = H_i$, the $\theta(n)$ 
process eventually settles at $i$. Indeed we show something 
stronger. Let us define 

\[ T_i := \inf\{n : \theta(n') = i, \ \forall n' \geq n\}, \tag{25} \]

the time from which $\theta(n)$ stays at $i$ forever. 
This random variable has a tail that decays exponentially fast, 
as shown next. 

Lemma 12. Assume (I) and (IIb). Consider policy $\pi_{SA}(\eta)$. 
Fix $i$. Then there exist $C > 0$ and $b > 0$, both finite and 
possibly dependent on $i$, such that 

\[ P_i(T_i > n) < C e^{-bn}. \tag{26} \]

Proof: By the union bound, 

\begin{align*}
P_i(T_i > n) &= P_i\left(\theta(n') \neq i \text{ for some } n' \geq n\right) \\
&\leq \sum_{n' \geq n} P_i\left(\theta(n') \neq i\right) \\
&\leq \sum_{n' \geq n} P_i\left(\min_{j \neq i} Z_{ij}(n') \leq 0\right).
\end{align*}

The assertion now follows from Proposition 11, applied with any $K > 0$, 
because the resulting geometric series is summable. ■ 

Thus far we have considered the policy $\pi_{SA}(\eta)$, which never 
stops. We now show that the policy $\pi_{SA}(L, \eta)$ stops in finite 
time. 

Proposition 13. Assume (I) and (IIb). Consider the policy 
$\pi_{SA}(L, \eta)$. Fix $i$. We then have 

\[ P_i\left(\tau(\pi_{SA}(L, \eta)) < \infty\right) = 1. \]

Proof: We consider $\pi^i_{SA}(L, \eta)$ for analysis. Recall that 
$\tau(\pi_{SA}(L, \eta)) \leq \tau(\pi^i_{SA}(L, \eta))$, and hence it is sufficient to 
show that 

\[ P_i\left(\tau(\pi^i_{SA}(L, \eta)) < \infty\right) = 1. \tag{27} \]

From Proposition 11 we know that, for a suitable constant $C$, 

\[ P_i\left(\min_{j \neq i} Z_{ij}(n) < \log(L(M - 1))\right) < C e^{-\gamma n}. \]

Since this bound is summable, by the Borel-Cantelli lemma, 

\[ P_i\left(\min_{j \neq i} Z_{ij}(n) < \log(L(M - 1)) \text{ infinitely often}\right) = 0, \]

which is stronger than the assertion (27). ■ 

Propositions 11 and 13 are the ones that will be used in the 
sequel. 




















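A. Proof of Proposition 3 

The proof relies on a standard change of measure argument. 
Let $\Delta_j$ denote the event that the policy $\pi_{SA}(L, \eta)$ declares $H_j$ 
as the true hypothesis. 

\begin{align*}
P_i(\delta \neq i) &= \sum_{j \neq i} P_i(\delta = j) + P_i\left(\tau(\pi_{SA}(L, \eta)) = \infty\right) \\
&= \sum_{j \neq i} \sum_{n > 0} \int_{\omega^n \in \Delta_j} dP_i(\omega^n) + 0 \\
&= \sum_{j \neq i} \sum_{n > 0} \int_{\omega^n \in \Delta_j} \frac{dP_i}{dP_j}(\omega^n) \, dP_j(\omega^n) \\
&\leq \sum_{j \neq i} \sum_{n > 0} \int_{\omega^n \in \Delta_j} \frac{1}{(M - 1)L} \, dP_j(\omega^n) \tag{28} \\
&\leq \frac{1}{L}.
\end{align*}

The equality in the second step is valid as we have shown in 
Proposition 13 that the stopping time is finite with probability 
1. The inequality (28) follows because $\omega^n \in \Delta_j$ implies 
$Z_{ji}(n) \geq \log((M - 1)L)$, that is, 
$\frac{dP_i}{dP_j}(\omega^n) \leq \frac{1}{(M - 1)L}$. ■ 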

B. Proof of Theorem 4: Achievability 

We assume (I) and (IIb). All statements in this proof are 
under $H = H_i$ and under Sluggish Procedure A. We follow 
the proof technique of Chernoff [1, Lem. 2]. Chernoff’s proof 
technique does not go through completely because, unlike in 
Procedure A, the next action in Sluggish Procedure A is not 
conditionally independent of the previous action, given the 
current likelihood values. A similar issue was addressed by 
Nitinawarat and Veeravalli in [8], and we will adapt their proof 
technique to our setting. 

Let us first set up some notation. Fix $\epsilon > 0$. Define 

\[ D_{ij} := \sum_{a \in \mathcal{A}} \lambda_i(a) D(q_i^a \| q_j^a), \]

where $\lambda_i$ is as defined in (2). Let $D_i$ be as defined by (3), 
i.e., $D_i = \min_{j \neq i} D_{ij}$. Under the Sluggish Procedure A, the 
transition probability matrix $TP(\theta(n))$ of the action process 
$A_n$ at time $n$ is given by 

\[ TP(\theta(n)) = (1 - \eta) I + \eta \left(\mathbf{1} \lambda_{\theta(n)}^T\right). \tag{29} \]

It is easy to verify that the stationary distribution associated 
with $TP(\theta(n))$ is $\lambda_{\theta(n)}$. Define $\mathcal{F}_{k-1} := \sigma(X^{k-1}, A^{k-1})$, the 
$\sigma$-field generated by the random variables $(X^{k-1}, A^{k-1})$. 
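
The claim about the stationary distribution can be checked numerically. The sketch below builds the matrix in (29) for an arbitrary, hypothetical mixed action and verifies that one step of the chain leaves it unchanged; the helper names are assumptions for illustration.

```python
def transition_matrix(lam, eta):
    """TP = (1 - eta) * I + eta * 1 lam^T, returned as a list of rows."""
    K = len(lam)
    return [
        [(1 - eta) * (1.0 if a == b else 0.0) + eta * lam[b] for b in range(K)]
        for a in range(K)
    ]


def step(dist, P):
    """One step of the chain: the row vector dist times the matrix P."""
    K = len(dist)
    return [sum(dist[a] * P[a][b] for a in range(K)) for b in range(K)]


lam = [0.2, 0.3, 0.5]
P = transition_matrix(lam, eta=0.1)
after_one_step = step(lam, P)
```

The check succeeds for any mixed action because multiplying it into (29) gives (1 - eta) times itself plus eta times itself, using only that its entries sum to one.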

We now upper bound the expected time to make a decision 
under Sluggish Procedure A as follows: 

\begin{align*}
E_i\left[\tau(\pi_{SA}(L, \eta))\right] &\leq E_i\left[\tau(\pi^i_{SA}(L, \eta))\right] \\
&= \sum_{n \geq 0} P_i\left(\tau(\pi^i_{SA}(L, \eta)) > n\right) \\
&\leq \frac{(1 + \epsilon) \log(L(M - 1))}{D_i} + \sum_{n \geq \tilde{n}} P_i\left(\tau(\pi^i_{SA}(L, \eta)) > n\right), \tag{30}
\end{align*}

where 

\[ \tilde{n} := \frac{(1 + \epsilon) \log(L(M - 1))}{D_i}. \]

To complete the proof, we will now show that for any $\epsilon > 0$, 
the second term on the right-hand side of (30) goes to zero 
as $L \to \infty$. Let us first analyse a term in the summation. We 
claim that each term decays exponentially with $n$. So the tail 
sum vanishes as $L \to \infty$, because $\tilde{n} \to \infty$. This suffices to 
complete the proof of Theorem 4. 

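We now proceed to prove the claim. Observe that 

\begin{align*}
P_i\left(\tau(\pi^i_{SA}(L, \eta)) > n\right) &\leq P_i\left(\min_{j \neq i} Z_{ij}(n) < \log(L(M - 1))\right) \\
&\leq \sum_{j \neq i} P_i\left(Z_{ij}(n) < \log(L(M - 1))\right).
\end{align*}

Fix one $j \neq i$. (The same analysis holds for other $j$.) Then 

\begin{align*}
& P_i\left(Z_{ij}(n) < \log(L(M - 1))\right) \\
&= P_i\left(\sum_{k=1}^{n} \Delta Z_{ij}(k) < \log(L(M - 1))\right) \\
&= P_i\Bigg(\sum_{k=1}^{n} \left(\Delta Z_{ij}(k) - E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] + \epsilon'\right) \\
&\qquad + \sum_{k=1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) + n(D_{ij} - 2\epsilon') < \log((M - 1)L)\Bigg) \\
&\leq P_i\left(\sum_{k=1}^{n} \left(\Delta Z_{ij}(k) - E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] + \epsilon'\right) < 0\right) \\
&\quad + P_i\left(\sum_{k=1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0\right) \\
&\quad + P_i\left(n(D_{ij} - 2\epsilon') < \log(L(M - 1))\right). \tag{31}
\end{align*}

Look at the first probability term in (31). Each entry within the 
summation has a positive mean and, from Chernoff’s bounding 
technique in [1, Lem. 2], there exists a $b(\epsilon') > 0$ such that 

\[ P_i\left(\sum_{k=1}^{n} \left(\Delta Z_{ij}(k) - E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] + \epsilon'\right) < 0\right) < e^{-n b(\epsilon')}. \]

The third probability term is 0 if we choose an $\epsilon'$ small 
enough such that $n(D_{ij} - 2\epsilon') > \log(L(M - 1))$, for all $n \geq \tilde{n}$. 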








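Indeed, any $\epsilon'$ satisfying $0 < \epsilon' < \frac{\epsilon}{1 + \epsilon} \frac{D_i}{2}$ suffices. So set 

\[ \epsilon' = \frac{\epsilon}{1 + \epsilon} \frac{D_i}{4}. \]

We now proceed to show that the second term also decays 
exponentially to zero. Let $T_i$ be as defined in (25). For a 
suitably chosen $\epsilon''$, and we will soon indicate how to choose 
it, we have 

\begin{align*}
& P_i\left(\sum_{k=1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0\right) \\
&\leq P_i\left(\sum_{k=1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0, \ T_i \leq n\epsilon''\right) + P_i\left(T_i > n\epsilon''\right).
\end{align*}

From Lemma 12, the second probability term on the right-hand 
side decays exponentially with $n$. To show that the first 
probability term on the right-hand side decays exponentially 
with $n$, we use a technique of Nitinawarat and Veeravalli [8, 
(6.23)]. 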

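First, we indicate how to choose $\epsilon''$. Define 

\begin{align*}
C' &= \min_{a \in \mathcal{A}} E_i[\Delta Z_{ij}(k) \mid A_k = a] - D_{ij} \\
&= \min_{a \in \mathcal{A}} D(q_i^a \| q_j^a) - D_{ij}.
\end{align*}

Since $D_{ij}$ is the $\lambda_i$-weighted average of $D(q_i^a \| q_j^a)$, we have 
$C' \leq 0$. Choose $\epsilon''$ small enough so that $\tilde{\epsilon} := \epsilon' + \epsilon'' C' > 0$. 
We then have 

\begin{align*}
& P_i\left(\sum_{k=1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0, \ T_i \leq n\epsilon''\right) \\
&= P_i\Bigg(\sum_{k=1}^{\lfloor n\epsilon'' \rfloor} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) \\
&\qquad + \sum_{k=\lfloor n\epsilon'' \rfloor + 1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0, \ T_i \leq n\epsilon''\Bigg) \\
&\leq P_i\Bigg(\lfloor n\epsilon'' \rfloor (C' + \epsilon') + \sum_{k=\lfloor n\epsilon'' \rfloor + 1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \epsilon'\right) < 0, \ T_i \leq n\epsilon''\Bigg) \\
&\leq \tilde{P}_i\Bigg(\sum_{k=\lfloor n\epsilon'' \rfloor + 1}^{n} \left(E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij} + \tilde{\epsilon}\right) < 0\Bigg) \\
&\leq C e^{-n b(\tilde{\epsilon})} \tag{32}
\end{align*}

for some $C > 0$ and some $b(\tilde{\epsilon}) > 0$. The second inequality 
follows from the fact that $C' \leq E_i[\Delta Z_{ij}(k) \mid \mathcal{F}_{k-1}] - D_{ij}$ for 
all $k$. The third inequality follows from the choice of $\tilde{\epsilon}$ and 
the fact that 

\[ \lfloor n\epsilon'' \rfloor (C' + \epsilon') + (n - \lfloor n\epsilon'' \rfloor) \epsilon' \geq (n - \lfloor n\epsilon'' \rfloor) \tilde{\epsilon}. \]

$\tilde{P}_i$ is a new measure under which actions are taken according 
to Sluggish Procedure A but assuming $\theta(n) = i$ $\forall n$, and 
the observations are conditionally independent of past observations 
and actions, given the current action. Consequently, 
under $\tilde{P}_i$, the action process $A_n$ is a stationary Markov 
chain with transition probability matrix $TP(i)$. By the ergodic 
theorem and concentration inequalities for Markov chains 
[11], this term also decays exponentially with $n$, which is (32). 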